    Attention Correctness in Neural Image Captioning

    Attention mechanisms have recently been introduced in deep learning for various tasks in natural language processing and computer vision. But despite their popularity, the "correctness" of the implicitly learned attention maps has only been assessed qualitatively, by visualizing several examples. In this paper we focus on evaluating and improving the correctness of attention in neural image captioning models. Specifically, we propose a quantitative evaluation metric for the consistency between the generated attention maps and human annotations, using recently released datasets with alignments between regions in images and entities in captions. We then propose novel models with different levels of explicit supervision for learning attention maps during training. The supervision can be strong when alignments between regions and caption entities are available, or weak when only object segments and categories are provided. We show on the popular Flickr30k and COCO datasets that introducing supervision of attention maps during training solidly improves both attention correctness and caption quality, showing the promise of making machine perception more human-like.
    Comment: To appear in AAAI-17. See http://www.cs.jhu.edu/~cxliu/ for supplementary material.
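
One plausible way to make the quantitative metric above concrete is to measure how much of the attention mass falls inside the human-annotated region for a given caption entity. The sketch below assumes exactly that; the function name `attention_correctness`, the nested-list map format, and the binary mask are illustrative assumptions, since the abstract does not spell out the metric.

```python
def attention_correctness(attention, region_mask):
    """Return the fraction of attention mass inside the annotated region.

    attention   -- 2D list of non-negative attention weights (assumed format)
    region_mask -- 2D list of 0/1 flags, 1 where the annotated entity lies
    """
    total = sum(sum(row) for row in attention)
    if total <= 0:
        raise ValueError("attention map must have positive total mass")
    inside = sum(
        weight
        for att_row, mask_row in zip(attention, region_mask)
        for weight, flag in zip(att_row, mask_row)
        if flag
    )
    # Dividing by the total makes the score independent of normalization.
    return inside / total


# Toy 2x2 attention map; the annotated entity occupies the right column.
attention = [[0.1, 0.2],
             [0.3, 0.4]]
mask = [[0, 1],
        [0, 1]]
score = attention_correctness(attention, mask)
```

Under this definition a score near 1 means the model attends almost entirely to the annotated region, and a score near the region's area fraction is what uniform attention would give.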

    Few-Shot Image Recognition by Predicting Parameters from Activations

    In this paper, we are interested in the few-shot learning problem. In particular, we focus on a challenging scenario where the number of categories is large and the number of examples per novel category is very limited, e.g. 1, 2, or 3. Motivated by the close relationship between the parameters and the activations in a neural network associated with the same category, we propose a novel method that can adapt a pre-trained neural network to novel categories by directly predicting the parameters from the activations. No training is required to adapt to novel categories, and fast inference is realized by a single forward pass. We evaluate our method on few-shot image recognition on the ImageNet dataset, where it achieves state-of-the-art classification accuracy on novel categories by a significant margin while keeping comparable performance on the large-scale categories. We also test our method on the MiniImageNet dataset, where it strongly outperforms the previous state-of-the-art methods.
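
The idea of predicting a category's classifier parameters from its activations can be illustrated with a minimal sketch. The mapping used here (the L2-normalized mean activation of the few examples) is a placeholder assumption chosen for simplicity, not the mapping from the paper, which the abstract does not describe; all function names and the toy vectors are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def predict_parameters(activations):
    """Map the activations of a few examples to classifier weights.

    Placeholder mapping (assumption): normalized mean activation.
    """
    count = len(activations)
    mean = [sum(column) / count for column in zip(*activations)]
    return l2_normalize(mean)


def classify(activation, weights_by_category):
    """Pick the category whose weight vector best matches the activation."""
    unit = l2_normalize(activation)
    scores = {
        category: sum(a * w for a, w in zip(unit, weights))
        for category, weights in weights_by_category.items()
    }
    return max(scores, key=scores.get)


# Weights for two existing categories (toy values).
weights = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
# Add a novel category from two examples, with no gradient-based training.
weights["zebra"] = predict_parameters([[0.1, 0.2, 0.9], [0.0, 0.1, 1.0]])
label = classify([0.05, 0.1, 0.8], weights)
```

Adding the novel category only requires computing activations and applying the mapping, which is what makes adaptation a forward-pass operation in this sketch.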

    Secure Transmission for Relay Wiretap Channels in the Presence of Spatially Random Eavesdroppers

    We propose a secure transmission scheme for a relay wiretap channel, where a source communicates with a destination via a decode-and-forward relay in the presence of spatially randomly distributed eavesdroppers. We assume that the source is equipped with multiple antennas, whereas the relay, the destination, and the eavesdroppers are equipped with a single antenna each. In the proposed scheme, in addition to information signals, the source transmits artificial noise signals in order to confuse the eavesdroppers. With the goal of maximizing the secrecy throughput of the relay wiretap channel, we derive a closed-form expression for the transmission outage probability and an easy-to-compute expression for the secrecy outage probability. Using these expressions, we determine the optimal power allocation factor and wiretap code rates that guarantee the maximum secrecy throughput while satisfying a secrecy outage probability constraint. Furthermore, we examine the impact of the number of source antennas on the secrecy throughput, showing that adding extra transmit antennas at the source significantly increases the secrecy throughput.
    Comment: 7 pages, 5 figures, accepted by IEEE Globecom 2015 Workshop on Trusted Communications with Physical Layer Security.